
fix meta_eval after refactor and add new meta_mmlu_instruct task for 3.2 #862

Merged 2 commits into main on Jan 28, 2025

Conversation

@wukaixingxp (Contributor) commented Jan 21, 2025

What does this PR do?

This PR fixes meta_eval after the refactor by setting the correct path and updating the MATH dataset URL. It also splits the 3.2 MMLU task into meta_mmlu_pretrain and meta_mmlu_instruct, tested as below:

Feature/Issue validation/testing

Please describe the tests that you ran to verify your changes and summarize the relevant results. Provide instructions so the tests can be reproduced, and list any relevant details of your test configuration.

  • add new meta_mmlu_instruct for 3.2 (a reproduction command is sketched after the results table below)
vllm (pretrained=meta-llama/Llama-3.2-3B-Instruct,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.4,data_parallel_size=1,max_model_len=8192,add_bos_token=True,seed=42), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto
|        Tasks        |Version|   Filter   |n-shot|  Metric   |   |Value |   |Stderr|
|---------------------|-------|------------|-----:|-----------|---|-----:|---|-----:|
|meta_instruct        |    N/A|            |      |           |   |      |   |      |
| - meta_gpqa         |      1|strict-match|     0|exact_match|↑  |0.3326|±  |0.0223|
| - meta_math         |      1|none        |     0|exact_match|↑  |0.4514|±  |0.0070|
| - meta_mmlu_instruct|      1|strict-match|     0|exact_match|↑  |0.6368|±  |0.0041|
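
For reference, a run like the one above maps onto the lm-evaluation-harness CLI. A minimal reproduction sketch, assuming the meta_eval task configs have already been prepared into a local directory (`./work_dir` and `eval_results` are placeholder paths, not paths from this PR):

```bash
lm_eval --model vllm \
  --model_args pretrained=meta-llama/Llama-3.2-3B-Instruct,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.4,data_parallel_size=1,max_model_len=8192,add_bos_token=True,seed=42 \
  --tasks meta_instruct \
  --batch_size auto \
  --include_path ./work_dir \
  --log_samples \
  --output_path eval_results
```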
  • test on 3B meta_mmlu_pretrain (the matching command variant follows the table below)
2025-01-28:14:28:57,156 INFO     [evaluation_tracker.py:287] Saving per-sample results for: meta_mmlu_pretrain
vllm (pretrained=meta-llama/Llama-3.2-3B,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.4,data_parallel_size=1,max_model_len=8192,add_bos_token=True,seed=42), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto
|        Tasks        |Version|Filter|n-shot| Metric |   |Value|   |Stderr|
|---------------------|-------|------|-----:|--------|---|----:|---|-----:|
|meta_pretrain        |    N/A|      |      |        |   |     |   |      |
| - meta_mmlu_pretrain|      1|none  |     0|acc     |↑  |0.566|±  |0.0042|
|                     |       |none  |     0|acc_norm|↑  |0.566|±  |0.0042|
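
The pretrain run differs only in the checkpoint and task group; a sketch under the same assumptions as above:

```bash
lm_eval --model vllm \
  --model_args pretrained=meta-llama/Llama-3.2-3B,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.4,data_parallel_size=1,max_model_len=8192,add_bos_token=True,seed=42 \
  --tasks meta_pretrain \
  --batch_size auto \
  --include_path ./work_dir \
  --log_samples \
  --output_path eval_results
```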

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Thanks for contributing 🎉!

@wukaixingxp (Contributor, Author) commented:

add mmlu_instruct for 3.2

vllm (pretrained=meta-llama/Llama-3.2-3B-Instruct,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.4,data_parallel_size=1,max_model_len=8192,add_bos_token=True,seed=42), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto
|      Tasks       |Version|   Filter   |n-shot|  Metric   |   |Value |   |Stderr|
|------------------|------:|------------|-----:|-----------|---|-----:|---|-----:|
|meta_mmlu_instruct|      1|strict-match|     0|exact_match|↑  |0.6368|±  |0.0041|
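
For context, lm-evaluation-harness tasks are registered through YAML configs, so the new meta_mmlu_instruct task would be defined in one. A rough sketch of the shape such a config takes; the dataset path, split name, doc_to_text helper, and extraction regex below are illustrative assumptions (only the strict-match filter and exact_match metric come from the results above), not the merged file:

```yaml
# Illustrative sketch only: dataset_path, test_split, the doc_to_text helper,
# and the regex are assumptions; see the merged meta_eval config for real values.
task: meta_mmlu_instruct
dataset_path: meta-llama/Llama-3.2-3B-Instruct-evals   # assumed source dataset
test_split: latest                                     # assumed split name
output_type: generate_until
doc_to_text: !function utils.doc_to_text               # hypothetical helper
doc_to_target: gold                                    # assumed target field
filter_list:
  - name: strict-match                                 # filter name from the table above
    filter:
      - function: regex
        regex_pattern: 'best answer is ([A-D])'        # assumed extraction pattern
      - function: take_first
metric_list:
  - metric: exact_match                                # metric from the table above
    aggregation: mean
    higher_is_better: true
```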

@wukaixingxp wukaixingxp marked this pull request as ready for review January 28, 2025 22:33
@wukaixingxp wukaixingxp requested a review from init27 January 28, 2025 22:33
@wukaixingxp wukaixingxp changed the title from "fix meta_eval after refactor" to "fix meta_eval after refactor and add new meta_mmlu_instruct task for 3.2" Jan 28, 2025
@init27 init27 merged commit 6bfd034 into main Jan 28, 2025
4 checks passed
@wukaixingxp wukaixingxp self-assigned this Jan 28, 2025